7 research outputs found
ZigZag: Universal Sampling-free Uncertainty Estimation Through Two-Step Inference
Whereas the ability of deep networks to produce useful predictions on many
kinds of data has been amply demonstrated, estimating the reliability of these
predictions remains challenging. Sampling approaches such as MC-Dropout and
Deep Ensembles have emerged as the most popular ones for this purpose.
Unfortunately, they require many forward passes at inference time, which slows
them down. Sampling-free approaches can be faster but suffer from other
drawbacks, such as lower reliability of uncertainty estimates, difficulty of
use, and limited applicability to different types of tasks and data.
In this work, we introduce a sampling-free approach that is generic and easy
to deploy, while producing reliable uncertainty estimates on par with
state-of-the-art methods at a significantly lower computational cost. It is
predicated on training the network to produce the same output with and without
additional information about that output. At inference time, when no prior
information is given, we use the network's own prediction as the additional
information. We prove that the difference between the two predictions is an
accurate uncertainty estimate and demonstrate our approach on various types of
tasks and applications.
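The two-step scheme described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `f`, `zero_prior`, and the toy model are placeholder names, and the toy network merely mimics a model that is self-consistent on familiar inputs and inconsistent elsewhere.

```python
import numpy as np

def zigzag_uncertainty(f, x, zero_prior):
    """Two-step, sampling-free uncertainty estimate (ZigZag-style sketch).

    `f(x, prior)` stands for a network trained to predict the same target
    both without prior information (`prior = zero_prior`) and with the
    ground truth supplied as `prior`.
    """
    y1 = f(x, zero_prior)       # first pass: no prior information
    y2 = f(x, y1)               # second pass: feed the prediction back in
    return y1, np.abs(y1 - y2)  # the discrepancy serves as the uncertainty

# Toy stand-in for a trained network: self-consistent on "familiar"
# inputs (|x| <= 2), inconsistent on out-of-distribution ones.
def toy_model(x, prior):
    y = np.tanh(x)
    return y + 0.5 * (prior - y) * (np.abs(x) > 2.0)

pred, unc = zigzag_uncertainty(toy_model, np.array([0.5, 3.0]), np.zeros(2))
# unc is ~0 for the in-distribution input and clearly positive for the other
```

Only a single extra forward pass is needed, which is where the speed advantage over sampling-based methods comes from.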
PartAL: Efficient Partial Active Learning in Multi-Task Visual Settings
Multi-task learning is central to many real-world applications.
Unfortunately, obtaining labelled data for all tasks is time-consuming,
challenging, and expensive. Active Learning (AL) can be used to reduce this
burden. Existing techniques typically involve picking images to be annotated
and providing annotations for all tasks.
In this paper, we show that it is more effective to select not only the
images to be annotated but also a subset of tasks for which to provide
annotations at each AL iteration. Furthermore, the annotations that are
provided can be used to guess pseudo-labels for the tasks that remain
unannotated. We demonstrate the effectiveness of our approach on several
popular multi-task datasets.
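The key idea above, selecting (image, task) pairs rather than whole images, can be sketched as follows. The scoring rule and all names are illustrative assumptions, not the paper's exact acquisition criterion:

```python
import numpy as np

def select_partial_annotations(uncertainty, budget):
    """Pick the (image, task) pairs with the highest uncertainty.

    `uncertainty` is an (n_images, n_tasks) score matrix. Instead of
    annotating every task for each chosen image, only the `budget`
    most uncertain pairs are sent to the annotator; the remaining
    tasks of those images can be filled in with pseudo-labels.
    """
    flat = np.argsort(uncertainty, axis=None)[::-1][:budget]
    return [tuple(np.unravel_index(i, uncertainty.shape)) for i in flat]

scores = np.array([[0.9, 0.1],
                   [0.2, 0.8],
                   [0.3, 0.4]])
pairs = select_partial_annotations(scores, budget=2)
# the two most uncertain pairs: image 0 / task 0 and image 1 / task 1
```

With a fixed annotation budget, spending it on individual (image, task) pairs spreads labels where they are most informative instead of duplicating effort across all tasks of one image.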
Double Refinement Network for Efficient Indoor Monocular Depth Estimation
Monocular depth estimation is the task of obtaining a measure of distance for
each pixel using a single image. It is an important problem in computer vision
and is usually solved using neural networks. Though recent works in this area
have shown significant improvement in accuracy, the state-of-the-art methods
tend to require massive amounts of memory and time to process an image. The
main purpose of this work is to improve the performance of the latest solutions
with no decrease in accuracy. To this end, we introduce the Double Refinement
Network architecture. The proposed method achieves state-of-the-art results on
the standard RGB-D benchmark dataset NYU Depth v2, while its frame rate is
significantly higher (up to an 18-times speedup per image at batch size 1) and
its RAM usage per image is lower.
Masksembles for Uncertainty Estimation
Deep neural networks have amply demonstrated their prowess but estimating the
reliability of their predictions remains challenging. Deep Ensembles are widely
considered as being one of the best methods for generating uncertainty
estimates but are very expensive to train and evaluate. MC-Dropout is another
popular alternative, which is less expensive, but also less reliable. Our
central intuition is that there is a continuous spectrum of ensemble-like
models of which MC-Dropout and Deep Ensembles are extreme examples. The first
uses an effectively infinite number of highly correlated models while the
second relies on a finite number of independent models.
To combine the benefits of both, we introduce Masksembles. Instead of
randomly dropping parts of the network as in MC-Dropout, Masksembles relies on
a fixed number of binary masks, which are parameterized in a way that allows
one to change the correlations between individual models. Namely, by
controlling the overlap between the masks and their density, one can choose
the optimal configuration for the task at hand. This leads to a simple,
easy-to-implement method with performance on par with Deep Ensembles at a
fraction of the cost. We experimentally validate Masksembles on two widely
used datasets, CIFAR10 and ImageNet.
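A minimal sketch of the fixed-mask idea is shown below. This simple random construction is an assumption for illustration, not the paper's exact mask-generation procedure; each mask keeps a `density` fraction of the features, and the overlap between masks is what moves the scheme between MC-Dropout-like (high overlap) and Deep-Ensembles-like (disjoint) behaviour.

```python
import numpy as np

def make_masks(n_masks, n_features, density, seed=0):
    """Generate a fixed set of binary masks (Masksembles-style sketch).

    Each of the `n_masks` masks keeps exactly `density * n_features`
    features. At inference time the input is passed through the network
    once per mask, and the spread of the resulting predictions gives
    the uncertainty estimate.
    """
    rng = np.random.default_rng(seed)
    keep = int(round(density * n_features))
    masks = np.zeros((n_masks, n_features))
    for m in masks:
        # choose which features this ensemble member keeps
        m[rng.choice(n_features, size=keep, replace=False)] = 1.0
    return masks

masks = make_masks(n_masks=4, n_features=8, density=0.5)
# 4 fixed masks, each keeping half of the 8 features
```

Because the masks are fixed rather than resampled, only a small, known number of forward passes is needed, unlike MC-Dropout's open-ended sampling.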
How to Boost Face Recognition with StyleGAN?
State-of-the-art face recognition systems require vast amounts of labeled
training data. Given the priority of privacy in face recognition applications,
the data is limited to celebrity web crawls, which suffer from issues such as a
limited number of identities. On the other hand, the self-supervised revolution
in the industry motivates research on adapting related techniques to facial
recognition. One of the most popular practical tricks is to augment the dataset
with samples drawn from generative models while preserving the identity. We
show that a simple approach based on fine-tuning the pSp encoder for StyleGAN
allows us to improve upon state-of-the-art facial recognition and performs
better than training on synthetic face identities. We also collect
large-scale unlabeled datasets with controllable ethnic constitution --
AfricanFaceSet-5M (5 million images of different people) and AsianFaceSet-3M (3
million images of different people) -- and we show that pretraining on each of
them improves recognition of the respective ethnicities (as well as others),
while combining all unlabeled datasets results in the biggest performance
increase. Our self-supervised strategy is most useful when the amount of
labeled training data is limited, which can be beneficial for more tailored
face recognition tasks and when facing privacy concerns. Evaluation is based on the
standard RFW dataset and a new large-scale RB-WebFace benchmark. The code and
data are made publicly available at
https://github.com/seva100/stylegan-for-facerec.
Comment: 16 pages, 9 figures, 11 tables; accepted to ICCV 202